fix(trace): make empty/aborted runs fail loudly instead of silently "running"#7
Merged
burrows99 merged 1 commit intoJun 19, 2026
Conversation
…running" A chrome run that captured nothing and recorded nothing was reported as success (ok:true, exit 0) and could sit on the dashboard's "running" badge forever — the agent had no signal it was broken. Root cause: the failure signals lived in stderr logs or dashboard-only state, never in the JSON envelope the agent reads. This closes those gaps and the matching violations of the logger's own two-channel contract (codes.ts: the stderr log and the envelope diagnostic for one event should carry the SAME code). - Terminal envelope on abort. If a run throws (attach failed, engine crashed, recording threw), DynamicCommand now emits a terminal envelope via onProgress — no `running` flag, ok:false, an ENGINE_FATAL diagnostic — so the collector resolves the session instead of leaving its initial "running" partial orphaned. Cli flushes that emit before exiting non-zero. - Recording outcomes are now envelope diagnostics, not just stderr. RECORD_EMPTY (no frames → empty video) and RECORD (render/upload threw) were log-only, so "no video" was invisible to a --json reader. Both now push a warn diagnostic carrying the same code as the log. - Upload failure no longer masquerades as a clean local save. S3ArtifactStore swallows failures and returns null, which #record could not tell apart from "no S3 configured". It now distinguishes the two and emits an UPLOAD diagnostic when a configured upload fails (link missing, local copy kept). - ENGINE_FATAL is now logged as well as diagnosed, so it appears in both channels per the contract. Adds test/dynamic-diagnostics.test.js (fake-tracer unit tests for the abort terminal envelope, captured-fatal, and clean-empty cases). typecheck + build clean; 44/45 tests pass (1 DB round-trip skipped). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
burrows99
added a commit
that referenced
this pull request
Jun 19, 2026
…ccurately (#6) * fix(collector): bound emit failure memory and word network failures accurately Follow-up to the emit-feedback diagnostics (#5), addressing two Copilot review notes: - Cli.ts kept every failed EmitResult in an array but only ever read the count and the last one. On a run that emits per onProgress, repeated failures grew that array unbounded. Replace it with a counter plus the most recent failure. - The emit diagnostic said the collector "rejected" the emits even when the POST never reached it (connection refused/timeout/DNS — no HTTP status). Word an HTTP rejection and a delivery failure distinctly. - Collector.emit returned the full rejection body, which a caller retains; a large error page could bloat memory. Cap the stored body at 10k chars (the log line still truncates to 500 independently). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(trace): make empty/aborted runs fail loudly instead of silently "running" (#7) A chrome run that captured nothing and recorded nothing was reported as success (ok:true, exit 0) and could sit on the dashboard's "running" badge forever — the agent had no signal it was broken. Root cause: the failure signals lived in stderr logs or dashboard-only state, never in the JSON envelope the agent reads. This closes those gaps and the matching violations of the logger's own two-channel contract (codes.ts: the stderr log and the envelope diagnostic for one event should carry the SAME code). - Terminal envelope on abort. If a run throws (attach failed, engine crashed, recording threw), DynamicCommand now emits a terminal envelope via onProgress — no `running` flag, ok:false, an ENGINE_FATAL diagnostic — so the collector resolves the session instead of leaving its initial "running" partial orphaned. Cli flushes that emit before exiting non-zero. - Recording outcomes are now envelope diagnostics, not just stderr. RECORD_EMPTY (no frames → empty video) and RECORD (render/upload threw) were log-only, so "no video" was invisible to a --json reader. Both now push a warn diagnostic carrying the same code as the log. - Upload failure no longer masquerades as a clean local save. S3ArtifactStore swallows failures and returns null, which #record could not tell apart from "no S3 configured". It now distinguishes the two and emits an UPLOAD diagnostic when a configured upload fails (link missing, local copy kept). - ENGINE_FATAL is now logged as well as diagnosed, so it appears in both channels per the contract. Adds test/dynamic-diagnostics.test.js (fake-tracer unit tests for the abort terminal envelope, captured-fatal, and clean-empty cases). typecheck + build clean; 44/45 tests pass (1 DB round-trip skipped). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(collector): bound the rejection body at the stream, add emit-diagnostic tests Addresses the two Copilot review notes on the PR: - Collector.emit buffered the whole rejection body via response.text() before slicing, so an oversized error page still caused a transient memory spike and download latency even though the stored body was capped. Read only up to MAX_BODY_CHARS off the stream, then cancel it — bounding memory and latency at the source. - Extract the emit-failure diagnostic wording into an exported emitFailureMessage() helper and cover it with focused tests: HTTP status → "rejected", no status (network) → "failed" (the regression guard so a POST that never landed isn't reported as a rejection), plus the no-body / missing-error / oversized-body edge cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(collector): flush the TextDecoder so a split multi-byte char isn't dropped readCappedText decoded chunks with { stream: true } but never did a final decode() flush. A response ending on a multi-byte UTF-8 sequence split across the last chunk boundary would drop/garble that trailing character in the stored body and logged reason. Flush after the read loop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
burrows99
added a commit
that referenced
this pull request
Jun 19, 2026
…#8) Keep the prose docs in step with the behavior changes from the collector/trace fixes (#5–#7). - README: the docker-compose section showed the container ports (:4747 · :5432 · :9000) instead of the host-published ports the compose file actually maps. Correct to :14747 · :65432 · :19000 (console :19001), and the S3_ENDPOINT example to :19000, so the README matches docker-compose.yml. The native-serve example keeps :5432 (your own Postgres) — that one was right. - skills/trace/SKILL.md: add a "Reading the result" section. The skill is deliberately minimal (the binary is the source of truth), but it gave zero guidance on interpreting a trace — the gap behind an agent treating an empty or perpetually-"running" trace as success. Names the three stable fields that decide whether a run did what was asked (ok · diagnostics[].code · meta.running) and the rule that ok:true + events:[] + a warn diagnostic means "ran fine, saw nothing — re-aim," not success. Field-level (no drift-prone code list).
burrows99
added a commit
that referenced
this pull request
Jun 19, 2026
The DX-gaps analysis enumerated six ways the CLI misled an agent; all six have since shipped. Keep the analysis verbatim as the rationale, but add a status banner + table and a per-gap resolution note pointing to where each was closed (and flagging the few sub-items still open: dashboard age-out of stale "running" (#6b), done/upload decoupling (#6c), and lockfile-based port auto-discovery (#5)). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
From the dashboard screenshot in the last session: a
run.chromesession pinned at the RUNNING badge with 0 events, no video, no lineage — and the agent never flagged it, because the CLI told it the run succeeded. Every signal that would say "this trace is broken or incomplete" was a non-blocking warning, a stderr log, or dashboard-only state — never in the--jsonenvelope the agent reads.This is also a violation of the logger's own contract (
src/shared/codes.ts): the stderr log and the envelope diagnostic for one event are supposed to carry the same code, so you can join "what I saw while it ran" to "what landed in the result." In practice the recording/upload failures were log-only and invisible to the result.Changes
1. Terminal envelope on abort —
DynamicCommandrunningpartial orphaned in the dashboard forever. The run now emits a terminal envelope viaonProgress— norunningflag,ok:false, anENGINE_FATALdiagnostic — so the collector resolves the session.Cliflushes that emit before exiting non-zero.2. Recording outcomes are envelope diagnostics, not just stderr —
DynamicCommand.#recordRECORD_EMPTY(no frames → empty video) andRECORD(render/upload threw) werelog.warn/log.erroronly, so "no video" was invisible to a--jsonreader. Both now also push awarndiagnostic with the same code as the log.3. Upload failure no longer masquerades as a clean local save —
DynamicCommand.#recordS3ArtifactStore.uploadswallows failures and returnsnull, which#recordcouldn't tell apart from "no S3 configured" — both became a tidy "saved locally". It now distinguishes them and emits anUPLOADdiagnostic when a configured upload fails (link missing, local copy kept).4.
ENGINE_FATALlogged as well as diagnosed — so it appears in both channels per the contract.The logging audit that motivated this (the "any violations?" answer)
Cross-referencing every
log.*andDiagnostic.*call site against thecodes.ts"same code in both channels" contract:RECORD,RECORD_EMPTY,UPLOADCHROME(→ now ENGINE_FATAL via abort path),LSP(→CODEGRAPH_FAILEDdownstream)CODEGRAPH_FAILED,EMIT,ENGINE_FATAL(now)STEP_FAILED,BP_UNBOUND,BP_BOUND_UNHIT,GRAPH_TRUNCATED,COMPLEXITY_HIGH,DEPS_CIRCULAR,TOOL_MISSINGEMIT(logerror, diagnosticwarn)Also checked and clean: stdout discipline — the logger is stderr-only (
logger.ts), and everyprocess.stdout.writeinsrc/is a legitimate data/human channel write, so--jsonis never polluted.Not fixed here (bigger / out of scope, tracked as recommendations): dashboard "age out stale
running" UI net, and decoupling the final "done" emit from the S3 upload so a slow upload can't delay the running→done transition.Tests
test/dynamic-diagnostics.test.js(new) — fake-tracer unit tests: thrown run → terminal envelope (runningcleared,ENGINE_FATAL); captured fatal →ok:false; clean-empty node trace →ok:true, not running.tsc --noEmitclean ·npm test→ 44/45 pass (1 DB round-trip skipped).Base
Stacked on
fix/collector-emit-bounded-memory(PR #6) — its Cli.ts edit is adjacent to this one's. Retarget tomasteronce #6 lands.🤖 Generated with Claude Code